📂 LLM & Models
Comparisons, benchmarks and practical guides on large language models: GPT, Claude, Gemini, Llama and free alternatives.
ICML 2026 Seoul: 6,500+ papers accepted, ML enters the agentic era — key takeaways
Explore AI trends at ICML 2026 Seoul: over 6,500 accepted papers and the agentic era in machine learning.
Claude Sonnet 5: Anthropic's most agentic model, Opus performance at Sonnet price
OpenAI GPT-5.6: Sol, Terra and Luna — the model family that changes everything
Discover OpenAI GPT-5.6: Sol, Terra and Luna, the revolutionary model family under direct government control from June 26, 2026.
GPT-5.6 Sol: OpenAI launches the preview of a new model amid the early price war
Discover GPT-5.6 Sol, OpenAI's new preview shaking up the AI market amid a price war. Analysis and stakes of this launch.
Poolside Laguna M.1: the 225B open-source model for the coding agent, Apache 2.0
Discover Poolside Laguna M.1, a 225B-parameter open-source model under Apache 2.0, built to revolutionize coding agents.
FrontierCode: Cognition's benchmark that buries SWE-Bench and ranks code agents by the real quality of pull requests — Fable 5 at 46.3%, Opus 4.8 at 34.3%, GPT-5.5 at 25.5%
Discover FrontierCode, Cognition's new benchmark replacing SWE-Bench by evaluating the real quality of code agents' pull requests.
DeepSWE: the benchmark proving that code agents were cheating — Artificial Analysis buries SWE-Bench
Discover DeepSWE, the new benchmark replacing SWE-Bench, proving code agents were cheating. Analysis of the rankings upended by Artificial Anal
Gemini 3.5 Pro: countdown — 10 days before Google's deadline, 2 million tokens and Deep Think mode, the most anticipated model of the year (amid talent chaos)
Gemini 3.5 Pro: 10 days before Google's deadline, discover the rumors about its 2 million tokens and Deep Think mode amid talent chaos.
GLM-5.2: The most powerful open weights model in the world — 753B MoE, 1M context, MIT license, the LLM landscape shifts
Discover GLM-5.2 from Z.ai: the world's most powerful open weights model. 753B MoE, 1M context & MIT license shaking up the LLM landscape.
CacheRL: A Qwen3-4B model achieves 92% accuracy in tool-calling with 100 times less compute than GPT-5
Discover CacheRL: a Qwen3-4B model hits 92% tool-calling accuracy with 100x less compute than GPT-5. AI revolution!
Best Coding LLMs (June 2026)
Discover the ultimate comparison of the best coding LLMs in June 2026. Analysis of agentic models capable of coding without human supervision.
Best Local LLMs (June 2026)
Discover the final ranking of the best local LLMs in June 2026. DeepSeek V4 Pro, Ollama: compare quality and privacy.
Kimi K2.7-Code: the 1T-parameter open-source coding model that cuts 30% of reasoning tokens and beats Opus in tool use
Discover Kimi K2.7-Code, a 1T-parameter open-source coding model cutting reasoning tokens by 30% and outperforming Opus in tool use.
DeepSeek V4-Pro: the permanent 75% price drop accelerating the LLM war
DeepSeek V4-Pro permanently drops its price by 75%. Discover how this LLM disrupts the market and accelerates the AI war.
Qwen3 Coder Next: the open-source model that runs on a 64 GB Mac and beats DeepSeek in coding
Discover Qwen3 Coder Next, the open-source model running on a 64GB Mac and beating DeepSeek at coding. A revolution for local code!
DiffusionGemma: Google releases the first open-source diffusion text model — 4x faster than autoregressive
Discover DiffusionGemma: Google's first open-source diffusion text model, 4x faster than classic autoregressive approaches.
Best LLMs (June 2026)
Discover the full June 2026 best LLM ranking after the GPT-5.5 release. Compare autonomous AI models and their reasoning.
Claude Fable 5: Anthropic makes its Mythos model accessible to the public
Anthropic launches Claude Fable 5, the first public version of its Mythos model. Discover this model deemed too powerful and its explosive scores.
Best Free LLMs (June 2026)
Discover the ranking of the best free LLMs in June 2026. Market analysis and comparison of uncensored AI models.
DeepSeek's DeepEP: the open-source library that optimizes GPU communication for large-scale MoE models
DeepSeek releases DeepEP, an open-source library that optimizes GPU communication to accelerate large-scale MoE model training.
NVIDIA Nemotron 3 Ultra 550B: The most powerful open-source model in the US arrives at Computex
Discover NVIDIA Nemotron 3 Ultra 550B, the most powerful US open-source model unveiled at Computex 2026 to rival China.
MiniMax M3: the Chinese open-weights model defying GPT-5.5 with 1M context and MSA architecture
Discover MiniMax M3, the Chinese open-weights model challenging GPT-5.5. It offers 1 million context tokens via MSA architecture.
DeepSeek V3.1: the silent revolution of open source arrives under the MIT license
DeepSeek V3.1 disrupts open source AI with a 671B parameter model under MIT license, with zero commercial restrictions.
Claude Opus 4.8: the model that dethrones GPT-5.5 — benchmarks, Dynamic Workflows, and the future of the coding agent
Anthropic's Claude Opus 4.8 dethrones GPT-5.5. Discover its benchmarks, the Dynamic Workflows system, and the coding agent revolution.
GPIC: Stanford releases 28 trillion pixels to train image generation models
Stanford releases GPIC, a 28-trillion-pixel dataset for training image generation models. Discover this permissive dataset.
LLMSurgeon: this ACL 2026 paper opens the black box of LLM pre-training
Discover LLMSurgeon, the ACL 2026 paper that opens the LLM pre-training black box to reveal their secret data mix.
Qwen3-Coder-Next: 80B MoE with 3B active, the open-source code agent that rivals Claude Sonnet
Discover Qwen3-Coder-Next: an 80B MoE (3B active) open-source code model rivaling Claude Sonnet on SWE-Bench.
OSCAR: Together AI open-sources a 2-bit KV cache quantization that reduces memory by 8x
Discover OSCAR: Together AI's open-source 2-bit KV cache quantization that cuts memory by 8x and optimizes LLM serving.
Stanford AI Index 2026: the 5 figures that show AI has passed a point of no return
Discover the Stanford AI Index 2026 and 5 key figures proving AI has crossed a point of no return.
Gated DeltaNet-2: the Yejin Choi paper that solves the oldest problem of linear attention
Discover Gated DeltaNet-2, Yejin Choi's paper that finally solves the oldest problem of linear attention in AI models.
Cursor Composer 2.5: The coding model that rivals Opus 4.7 at a tenth of the price
Discover Cursor Composer 2.5, a coding model rivaling Claude Opus 4.7 at a tenth of the price. AI price war analysis.
DeepWeb-Bench: The new benchmark that exposes the weaknesses of AI search agents
Discover DeepWeb-Bench, the new benchmark proving AI search agent scores are inflated and exposing their true weaknesses.
Gemini 3.5 Flash: the fast model that beats Opus 4.7 and GPT-5.5 on agent benchmarks — 289 tokens/second
Discover Gemini 3.5 Flash: the ultra-fast model at 289 tokens/sec beating Claude Opus 4.7 and GPT-5.5 on agent benchmarks.
General Preference RL: this paper unifies reinforcement learning and preference optimization for LLMs
Discover the General Preference RL paper unifying reinforcement learning and preference optimization to solve LLM post-training.
OpenAI Parameter Golf: The challenge that proves small models are the future of AI
Discover the OpenAI Parameter Golf challenge: why compressing an LLM into 16 MB proves small models are the future of AI.
Meta Muse Spark: why Meta betrayed open-source — the first closed model from the Superintelligence Lab
Discover why Meta Muse Spark is a turning point: the first closed model from the Superintelligence Lab that betrays Meta's open-source promise.
MeMo: Memory as a Model — memory as an autonomous model for updating LLMs without retraining
Discover MeMo (Memory as a Model): the innovative solution to update LLMs without retraining and defeat knowledge obsolescence.
SDAR: how to train AI agents with reinforcement learning without breaking them — agentic self-distillation
Discover SDAR (Self-Distillation Agentic Reinforcement): the method to train your AI agents with reinforcement learning without breaking them.
OpenDeepThink: Bradley-Terry comparison-based parallel reasoning changes the game for LLM inference
Discover OpenDeepThink: how Bradley-Terry comparison parallel reasoning revolutionizes LLM inference and outperforms sequential chain-of-thought
Negation Neglect: when fine-tuning makes LLMs blind to falsehoods
Discover the Negation Neglect phenomenon: how fine-tuning LLMs against fake news ends up making them blind to falsehoods.
KV-Fold: The training-free trick that revolutionizes long-context inference in LLMs
Discover KV-Fold, the training-free trick revolutionizing LLM long-context inference and solving the token management nightmare.
Attractor Models: the new architecture that beats Transformers at reasoning
Discover Attractor Models, the new AI architecture that outperforms Transformers on reasoning at equivalent parameters.
UniPool: the newcomer in MoE architectures decouples network depth from expert growth
Discover UniPool, the innovation revolutionizing MoE architectures by decoupling network depth from expert growth.
Best Free LLMs (May 2026)
Discover the best free LLMs of May 2026. Our comparison helps you find the ideal open-source or freemium AI without paying.
VaultGemma: Google DeepMind releases the world's most powerful differentially private LLM
Discover VaultGemma, the world's most powerful differentially private LLM by Google DeepMind. Mathematical guarantees for your data.
Subquadratic exits stealth with SubQ: 12 million context tokens, the end of quadratic attention?
Subquadratic unveils SubQ: a revolutionary AI model handling 12M context tokens and ending quadratic attention.
Tokens, context, costs: understanding LLM billing
Understand LLM billing: tokens, context window, cost calculation & 2026 price comparison chart. 12 tips to cut your expenses.
Claude, GPT, Gemini, Llama: Which Model to Choose in 2026?
Choosing a language model (LLM) in 2026 is a bit like choosing a car: there’s no universal "best"—only the best for you. Between Anthropic’s Claude, OpenAI’s...
SigLoMa: a quadruped robot that learns manipulation in the real world using vision alone
Meet SigLoMa, a revolutionary quadruped robot that learns real-world manipulation tasks using vision alone. Explore the future of robotics.
Qwen3.6: Alibaba launches a new family of LLMs
Discover Qwen3.6, Alibaba's new LLM family. With its MoT architecture (35B-A3B), rival GPT-4 at a lower cost. Deployment guide inc